"a"
"a, b,c"
c("a","b","c")Objects in R
2023 Bio R Workshop
An Intuitive Framework
One way to get a quick, if partial, understanding of a complex system of ideas is to form a simplified mental picture of it. The same approach helps when learning R, whose learning curve is quite steep:
“The learning curve for R programming is steep due to its unique syntax and extensive set of commands, requiring most new learners to spend four to six weeks mastering it.” - Noble Desktop, (NYC’s Top Design & Coding School Since 1990)
An (over)simplified mental picture for R beginners is to think of working in R as cooking. Cooking essentially requires three things:
- Ingredients – R objects, a.k.a. “data containers”
- Cooking utensils/equipment – R functions
- Recipe – R scripts or Markdown files
You can think of RStudio’s Console and Source Panes as the cooking table of the “chef” (you).
Vectors
Probably the most fundamental object that acts as a “data container” (i.e. data structure) in R is the vector (also called an atomic vector). Almost all other objects that a typical R user works with are built from vectors. Every vector has three properties:
- Type – typeof(), what it is
- Length – length(), how many elements it contains
- Attributes – attributes(), additional arbitrary metadata
Vectors can be created in many ways. The two most basic ways depend on the length of the vector:
- Length = 1. Directly run a single value (a number or a quoted string) in the Console Pane.
- Length > 1. Use the R combine function c().
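The three properties and the two basic ways of creating a vector can be seen together in a short sketch. The variable names and the element names (math, art, music) are made up for this illustration and are not part of the workshop material.

```r
# A single value typed on its own is already a vector of length 1.
single <- "a"
length(single)        # 1

# c() combines several values into one longer vector.
# Naming the elements adds a "names" attribute (hypothetical example data).
scores <- c(math = 90, art = 85, music = 70)

typeof(scores)        # "double"
length(scores)        # 3
attributes(scores)    # a list with one entry, $names

# A vector created without names carries no attributes.
plain <- c(90, 85, 70)
attributes(plain)     # NULL
```

Most plain vectors report NULL for attributes(), which is why the examples that follow focus on typeof() and length().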
Characters or Strings
"a"
"a, b,c"
c("a","b","c")
typeof("a")
typeof("a, b,c")
typeof(c("a","b","c"))
length("a")
length("a, b,c")
length(c("a","b","c"))
Numbers
15L
1.0
1 + 2i
c(1L,2L,0L,-15L)
c(1.0,1,4,6,-56,1e-10,1e4)
c(1 + 2i,1,0 - 3i, 3i)
typeof(15L)
typeof(1.0)
typeof(1 + 2i)
typeof(c(1L,2L,0L,-15L))
typeof(c(1.0,1,4,6,-56,1e-10,1e4))
typeof(c(1 + 2i,1,0 - 3i, 3i))
length(15L)
length(1.0)
length(1 + 2i)
length(c(1L,2L,0L,-15L))
length(c(1.0,1,4,6,-56,1e-10,1e4))
length(c(1 + 2i,1,0 - 3i, 3i))
Logical or Boolean
T
F
TRUE
FALSE
c(T,FALSE)
c(T,T,T,T,F,FALSE,F,TRUE,T,FALSE,T)
typeof(c(T,FALSE))
length(c(T,FALSE))
attributes(c(T,FALSE))
Matrix
# Number of entries matches number of elements
matrix(c(1,2,3,4,5,6,7,8), nrow = 2, ncol = 4)
matrix(c(1,2,3,4,5,6,7,8), nrow = 2, ncol = 4, byrow = TRUE)
# Number of entries does not match number of elements
# Resolved by recycling elements
matrix(c(1,2,3,4,5,6,7,8), nrow = 2, ncol = 10)
matrix(c(1,2,3,4,5,6,7,8), nrow = 2, ncol = 13, byrow = TRUE)
## Example of setting row and column names
matrix(data = c(1,2,3, 11,12,13),
       nrow = 2,
       ncol = 3,
       byrow = TRUE,
       dimnames = list(c("row1", "row2"),
                       c("C.1", "C.2", "C.3")))
cbind(c(1,2,3,4), c(5,6,7,8))
rbind(c(1,2,3,4), c(5,6,7,8))
cbind(c(1,2,3,4),
      c(5,6,7,8),
      c("A","B","C","D"))
rbind(c(1,2,3,4),
      c(5,6,7,8),
      c(T,F,T,T))
rbind(c(143,243),
      cbind(c(5,6,7,8),
            c(T,F,T,T)))
Data Frame
data.frame(
  ID = c(1103,1483,5670),
  Name = c("Mark","John","Maria"),
  Age = c(15L,13L,16L),
  BType = c("A","O","B"),
  WVaccine = c(T,T,F)
)
    ID  Name Age BType WVaccine
1 1103  Mark  15     A     TRUE
2 1483  John  13     O     TRUE
3 5670 Maria  16     B    FALSE
dplyr::tibble(
  ID = c(1103,1483,5670),
  Name = c("Mark","John","Maria"),
  Age = c(15L,13L,16L),
  BType = c("A","O","B"),
  WVaccine = c(T,T,F)
)
# A tibble: 3 × 5
     ID Name    Age BType WVaccine
  <dbl> <chr> <int> <chr> <lgl>
1  1103 Mark     15 A     TRUE
2  1483 John     13 O     TRUE
3  5670 Maria    16 B     FALSE
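Since most R objects are built from vectors, it can help to see that each column of a data frame is itself a vector with its own type. The following is a minimal sketch using a shortened copy of the student table above. The stringsAsFactors argument is set explicitly only so the result is the same on older R versions.

```r
# A shortened copy of the student table (three of the five columns).
student_data <- data.frame(
  ID = c(1103, 1483, 5670),
  Name = c("Mark", "John", "Maria"),
  Age = c(15L, 13L, 16L),
  stringsAsFactors = FALSE  # keep Name as character on older R versions too
)

# Each column is a vector, so typeof() can be applied column by column.
column_types <- sapply(student_data, typeof)
column_types        # ID: "double", Name: "character", Age: "integer"

# A single column is extracted with $ and behaves like any other vector.
student_data$Age    # 15 13 16
length(student_data$Age)  # 3, one element per row
```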
Lists
A list is a vector “on steroids”. While a vector allows only a single type of data (logical, numeric, etc.), a list allows a mixture of different types. In other words, a vector is a homogeneous container while a list is a heterogeneous one.
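The difference between homogeneous and heterogeneous containers shows up as soon as values of different types are mixed. In this small sketch (the variable names are chosen only for illustration), c() silently converts everything to a common type, while list() keeps each element as it was.

```r
# c() must pick one type for all elements, so everything becomes character.
mixed_vector <- c(1, "A", TRUE)
mixed_vector            # "1" "A" "TRUE"
typeof(mixed_vector)    # "character"

# list() keeps every element with its own type.
mixed_list <- list(1, "A", TRUE)
element_types <- sapply(mixed_list, typeof)
element_types           # "double" "character" "logical"
```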
c(1,2,3)
list(1,2,3)
c(1,"A",TRUE,c(5.4,-4.0))
list(1,"A",TRUE,c(5.4,-4.0))
typeof(list(1,"A",TRUE,c(5.4,-4.0)))
length(list(1,"A",TRUE,c(5.4,-4.0)))
attributes(list(1,"A",TRUE,c(5.4,-4.0)))
list(Name1 = 1, Name2 = "A", Name3 = TRUE, Name4 = c(5.4,-4.0))
typeof(list(Name1 = 1, Name2 = "A", Name3 = TRUE, Name4 = c(5.4,-4.0)))
length(list(Name1 = 1, Name2 = "A", Name3 = TRUE, Name4 = c(5.4,-4.0)))
attributes(list(Name1 = 1, Name2 = "A", Name3 = TRUE, Name4 = c(5.4,-4.0)))
list(Name1 = 1,
     Name2 = "A",
     Name3 = TRUE,
     Name4 = c(5.4,-4.0))
list("Name 1" = 1,
     "Name 2" = "A",
     "Name 3" = TRUE,
     "Name 4" = c(5.4,-4.0))
list(`Name 1` = 1,
     `Name 2` = "A",
     `Name 3` = TRUE,
     `Name 4` = c(5.4,-4.0))
list(
  `A vector` = 1:10,
  `A matrix` = matrix(1:9, nrow = 3),
  `A list` = list(Name1 = 1,
                  Name2 = "A",
                  Name3 = TRUE,
                  Name4 = c(5.4,-4.0))
)
Variables and Constants
In computer programming, a variable is a named memory location where data is stored. Constants are entities whose values are not meant to change anywhere in the code.
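To make the distinction concrete, here is a small sketch. Base R has no dedicated keyword for declaring a constant, so this example uses lockBinding() as one possible way to protect a value. The names counter and TAX_RATE are hypothetical, and this is only one way to approach it.

```r
# A variable: its value can be reassigned freely.
counter <- 1
counter <- counter + 1
counter                  # 2

# A value meant to stay constant (hypothetical example).
TAX_RATE <- 0.12

# lockBinding() makes the name read-only in the current environment.
lockBinding("TAX_RATE", environment())

# Trying to reassign a locked name raises an error, caught here with tryCatch().
outcome <- tryCatch({
  TAX_RATE <- 0.5
  "changed"
}, error = function(e) "blocked")

outcome                  # "blocked"
TAX_RATE                 # 0.12
```

In everyday scripts, many R users simply rely on a naming convention, such as upper-case names, to signal that a value should not be changed.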
x <- c(5,19,-2,0)
x
typeof(x)
length(x)
HONEY <- list(1,"A",TRUE,c(5.4,-4.0))
HONEY
typeof(HONEY)
length(HONEY)
student_data <- data.frame(
  ID = c(1103,1483,5670),
  Name = c("Mark","John","Maria"),
  Age = c(15L,13L,16L),
  BType = c("A","O","B"),
  WVaccine = c(T,T,F)
)
student_data
typeof(student_data)
length(student_data)
attributes(student_data)
The variables x, HONEY, and student_data are stored in the Global Environment, which you can inspect in RStudio’s Environment Pane.
You can also list all the variables currently stored in the Global Environment using the ls() command:
ls()
There are certain rules to follow when naming variables and constants:
- A variable name in R can be created using letters, digits, periods, and underscores.
- You can start a variable name with a letter or a period, but not with a digit.
- For multi-word variable names, it is advised to use underscores in place of spaces. For example, first_name, student_id, etc.
- If a variable name starts with a dot, you can’t follow the dot with a digit.
- R is case sensitive. This means that age and Age are treated as different variables.
- Some reserved words cannot be used as variable names. These are names that are built into R, and changing them leads to “horrifying” consequences. You are warned!
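The naming rules above can be checked with a quick sketch. It relies on the base function make.names(), which rewrites any invalid name into a valid one, so a name that comes back unchanged is already valid. The candidate names are made up for this illustration.

```r
# Candidate names, some valid and some not (hypothetical examples).
candidates <- c("first_name", "student.id", ".hidden",
                "2nd_try", ".5x", "my var", "if")

# make.names() leaves valid names untouched and rewrites invalid ones,
# so comparing before and after tells us which names follow the rules.
is_valid <- candidates == make.names(candidates)
data.frame(name = candidates, valid = is_valid)
# first_name, student.id and .hidden are valid.
# 2nd_try (starts with a digit), .5x (dot followed by a digit),
# "my var" (contains a space) and if (reserved word) are not.

# R is case sensitive: age and Age are two separate variables.
age <- 15
Age <- 51
identical(age, Age)   # FALSE
```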
Special and Built-in R Constants
Special R Constants:
- NULL – to declare an empty R object.
  x <- NULL
  x
  x <- c(5,NULL,-6)
  x
- Inf/-Inf – represent positive and negative infinity, or numbers that exceed the capacity of the machine.
  Inf
  -Inf
- NaN (Not a Number) – represents an undefined numerical value like 0/0 or Inf/Inf.
  NaN
  0/0
  Inf/Inf
- NA (Not Available) – represents a value that is not available.
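A few quick checks show how these special constants behave in practice. This is a small sketch, and the ages vector is invented for the example.

```r
# NULL is an empty object, so it vanishes when combined with other values.
length(NULL)               # 0
length(c(5, NULL, -6))     # 2

# A number too large for the machine to represent becomes Inf.
2^1024                     # Inf

# NaN marks an undefined numerical result; NA marks a missing value.
is.nan(0 / 0)              # TRUE
is.na(NA)                  # TRUE
is.nan(NA)                 # FALSE: a missing value is not the same as NaN

# Missing values propagate through calculations unless removed explicitly.
ages <- c(15, NA, 16)
mean(ages)                 # NA
mean(ages, na.rm = TRUE)   # 15.5
```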
Built-in R Constants:
- LETTERS – the 26 upper-case letters of the Roman alphabet
  LETTERS
- letters – the 26 lower-case letters of the Roman alphabet
  letters
- month.abb – the three-letter abbreviations for the English month names
  month.abb
- month.name – the English names for the months of the year
  month.name
- pi – the constant \(\pi=3.1415927\ldots\), i.e. the ratio of the circumference of a circle to its diameter
  pi
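The built-in constants are ordinary vectors, so they can be indexed and used in calculations like any other object. The following is a brief sketch, with the radius value chosen arbitrarily.

```r
# Pick elements out of the built-in vectors by position.
LETTERS[1:3]           # "A" "B" "C"
letters[26]            # "z"
month.abb[12]          # "Dec"
month.name[1]          # "January"

# Find the position of a letter in the alphabet.
which(letters == "r")  # 18

# Use pi in a calculation: area of a circle with an arbitrary radius of 2.
radius <- 2
circle_area <- pi * radius^2
circle_area            # roughly 12.57
```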